k-Anonymity in the Presence of External Databases
The concept of k-anonymity has received considerable attention due to the need of several organizations to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process. Specifically, existing generalization algorithms create anonymous tables using only the microdata table (MT) to be published, independently of the external knowledge available. This omission leads to high information loss. Motivated by this observation, we first introduce the concept of k-join-anonymity (KJA), which permits more effective generalization to reduce the information loss. Briefly, KJA anonymizes a superset of MT, which includes selected records from PD. We propose two methodologies for adapting k-anonymity algorithms to their KJA counterparts. The first generalizes the combination of MT and PD, under the constraint that each group should contain at least one tuple of MT (otherwise, the group is useless and discarded). The second anonymizes MT, and then refines the resulting groups using PD. Finally, we evaluate the effectiveness of our contributions with an extensive experimental evaluation using real and synthetic datasets.
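For readers unfamiliar with the baseline property, the following is a minimal sketch of a plain k-anonymity check, not of KJA itself. It assumes a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The function name `is_k_anonymous`, the column names, and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times.

    rows: list of dicts, one per (already generalized) record.
    quasi_identifiers: column names that could be joined with external data.
    """
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())


# Hypothetical generalized microdata: ages are ranges, zip codes are masked.
table = [
    {"age": "30-39", "zip": "123**", "disease": "flu"},
    {"age": "30-39", "zip": "123**", "disease": "asthma"},
    {"age": "40-49", "zip": "456**", "disease": "flu"},
    {"age": "40-49", "zip": "456**", "disease": "diabetes"},
]

two_anonymous = is_k_anonymous(table, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(table, ["age", "zip"], 3)
```

In this toy table each quasi-identifier group holds two records, so the check passes for k = 2 and fails for k = 3.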
TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching
Set similarity join is an important problem with many applications in data discovery, cleaning and integration. To increase robustness, fuzzy set similarity join calculates the similarity of two sets based on maximum weighted bipartite matching instead of set overlap. This allows pairs of elements, represented as sets or strings, to also match approximately rather than exactly, e.g., based on Jaccard similarity or edit distance. However, this significantly increases the verification cost, making the need for efficient and effective filtering techniques to reduce the number of candidate pairs even more important. The current state-of-the-art algorithm relies on similarity computations between pairs of elements to filter candidates. In this paper, we propose token-based instead of element-based filtering, showing that it is significantly more lightweight, while offering similar or even better pruning effectiveness. Moreover, we address the top-k variant of the problem, alleviating the need for a user-specified similarity threshold. We also propose early termination to reduce the cost of verification. Our experimental results on six real-world datasets show that our approach always outperforms the state of the art, being an order of magnitude faster on average.
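To make the verification step concrete, here is a small sketch of a fuzzy set similarity computed through maximum weighted bipartite matching. It is not the TokenJoin algorithm and contains no filtering. It assumes that elements are token sets, that edge weights are Jaccard similarities, and that the matching score M is normalized as M / (|R| + |S| - M). The matching is found by brute force, which is only practical for tiny inputs. All function names are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def max_matching_score(R, S):
    """Weight of a maximum weighted bipartite matching between R and S.

    Brute force over all assignments of the smaller side to the larger one;
    illustrative only, since the cost grows factorially with set size.
    """
    if len(R) > len(S):
        R, S = S, R
    best = 0.0
    for perm in permutations(range(len(S)), len(R)):
        score = sum(jaccard(R[i], S[j]) for i, j in enumerate(perm))
        best = max(best, score)
    return best


def fuzzy_jaccard(R, S):
    """Assumed normalization: M / (|R| + |S| - M)."""
    m = max_matching_score(R, S)
    denom = len(R) + len(S) - m
    return m / denom if denom else 1.0


# Two sets whose elements are themselves token sets.
R = [{"new", "york"}, {"city"}]
S = [{"new", "york"}, {"city", "hall"}]
similarity = fuzzy_jaccard(R, S)
```

Here the best matching pairs the identical elements (weight 1) and the partially overlapping ones (weight 0.5), giving M = 1.5 and a similarity of 1.5 / 2.5 = 0.6, whereas exact set overlap would count only one matching element.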
Fairness Aware Counterfactuals for Subgroups
In this work, we present Fairness Aware Counterfactuals for Subgroups
(FACTS), a framework for auditing subgroup fairness through counterfactual
explanations. We start with revisiting (and generalizing) existing notions and
introducing new, more refined notions of subgroup fairness. We aim to (a)
formulate different aspects of the difficulty of individuals in certain
subgroups to achieve recourse, i.e. receive the desired outcome, either at the
micro level, considering members of the subgroup individually, or at the macro
level, considering the subgroup as a whole, and (b) introduce notions of
subgroup fairness that are robust, if not totally oblivious, to the cost of
achieving recourse. We accompany these notions with an efficient,
model-agnostic, highly parameterizable, and explainable framework for
evaluating subgroup fairness. We demonstrate the advantages, the wide
applicability, and the efficiency of our approach through a thorough
experimental evaluation of different benchmark datasets.
Interactivity, Fairness and Explanations in Recommendations
More and more aspects of our everyday lives are influenced by automated decisions made by systems that statistically analyze traces of our activities. It is thus natural to question whether such systems are trustworthy, particularly given the opaqueness and complexity of their internal workings. In this paper, we present our ongoing work towards a framework that aims to increase trust in machine-generated recommendations by combining ideas from three separate recent research directions, namely explainability, fairness and user interactive visualization. The goal is to enable different stakeholders, with potentially varying levels of background and diverse needs, to query, understand, and fix sources of distrust.
New Trends in Scientific Knowledge Graphs and Research Impact Assessment
On-Line Discovery of Hot Motion Paths
We consider an environment of numerous moving objects, equipped with location-sensing devices and capable of communicating with a central coordinator. In this setting, we investigate the problem of maintaining hot motion paths, i.e., routes frequently followed by multiple objects over the recent past. Motion paths approximate portions of objects' movement within a tolerance margin that depends on the uncertainty inherent in positional measurements. Discovery of hot motion paths is important to applications requiring classification/profiling based on monitored movement patterns, such as targeted advertising, resource allocation, etc. To achieve this goal, we delegate part of the path extraction process to objects, by assigning to them adaptive lightweight filters that dynamically suppress unnecessary location updates and, thus, help reduce the communication overhead. We demonstrate the benefits of our methods and their efficiency through extensive experiments on synthetic data sets.
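The idea of suppressing location updates on the object side can be illustrated with a much simpler filter than the adaptive ones described above. The sketch below is an assumption-laden simplification: an object reports a new position only when it has moved more than a fixed tolerance away from the last reported one. The function name `suppress_updates`, the fixed tolerance, and the sample track are hypothetical.

```python
import math


def suppress_updates(positions, tolerance):
    """Return the positions an object would actually send to the coordinator.

    A position is reported only if it lies farther than `tolerance` from the
    most recently reported position; all other updates are suppressed.
    """
    reported = []
    for p in positions:
        if not reported or math.dist(p, reported[-1]) > tolerance:
            reported.append(p)
    return reported


# Hypothetical track of one object moving along the x-axis.
track = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (1.3, 0.0), (3.0, 0.0)]
sent = suppress_updates(track, tolerance=1.0)
```

With a tolerance of 1.0, only three of the five measurements are transmitted, which shows how a tolerance margin trades positional precision for lower communication overhead.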
Databases and Information Systems in the AI Era: Contributions from ADBIS, TPDL and EDA 2020 Workshops and Doctoral Consortium
Research on database and information technologies has been rapidly evolving over the last couple of years. This evolution was led by three major forces: Big Data, AI and the Connected World, which open the door to innovative research directions and challenges while drawing on four main areas: (i) computational and storage resource modeling and organization; (ii) new programming models; (iii) processing power; and (iv) new applications that emerge in health, environment, education, cultural heritage, banking, etc. The 24th East-European Conference on Advances in Databases and Information Systems (ADBIS 2020), the 24th International Conference on Theory and Practice of Digital Libraries (TPDL 2020) and the 16th Workshop on Business Intelligence and Big Data (EDA 2020), held during August 25–27, 2020, in Lyon, France, and associated satellite events aimed at covering some emerging issues related to database and information system research in these areas. The aim of this paper is to present such events, their motivations, and topics of interest, as well as briefly outline the papers selected for presentations. The selected papers will then be included in the remainder of this volume.
Towards Mobility Data Science (Vision Paper)
Mobility data captures the locations of moving objects such as humans,
animals, and cars. With the availability of GPS-equipped mobile devices and
other inexpensive location-tracking technologies, mobility data is collected
ubiquitously. In recent years, the use of mobility data has demonstrated
significant impact in various domains including traffic management, urban
planning, and health sciences. In this paper, we present the emerging domain of
mobility data science. Towards a unified approach to mobility data science, we
envision a pipeline having the following components: mobility data collection,
cleaning, analysis, management, and privacy. For each of these components, we
explain how mobility data science differs from general data science, we survey
the current state of the art and describe open challenges for the research
community in the coming years.